Bayesian modelling of compositional heterogeneity in molecular phylogenetics.

Authors

  • Sarah E Heaps
  • Tom M W Nye
  • Richard J Boys
  • Tom A Williams
  • T Martin Embley
Abstract

In molecular phylogenetics, standard models of sequence evolution generally assume that sequence composition remains constant over evolutionary time. However, this assumption is violated in many datasets that show substantial heterogeneity in sequence composition across taxa. We propose a model which allows compositional heterogeneity across branches, and formulate the model in a Bayesian framework. Specifically, the root and every branch of the tree each have their own composition vector, whilst a global matrix of exchangeability parameters applies everywhere on the tree. We encourage borrowing of strength between branches by developing two possible priors for the composition vectors: one in which information can be exchanged equally amongst all branches of the tree and another in which more information is exchanged between neighbouring branches than between distant branches. We also propose a Markov chain Monte Carlo (MCMC) algorithm for posterior inference which uses data augmentation of substitutional histories to yield a simple complete data likelihood function that factorises over branches and allows Gibbs updates for most parameters. Standard phylogenetic models are not informative about the root position. A significant advantage of the proposed model is therefore that it allows inference about rooted trees. The position of the root is fundamental to the biological interpretation of trees, both for polarising trait evolution and for establishing the order of divergence among lineages. Furthermore, unlike some other related models from the literature, inference in the model we propose can be carried out through a simple MCMC scheme which does not require problematic dimension-changing moves. We investigate the performance of the model and priors in analyses of two alignments for which there is strong biological opinion about the tree topology and root position.
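The idea of one composition vector per branch combined with a single global exchangeability matrix can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a GTR-style parameterisation in which the off-diagonal rate from state i to state j on a branch is the exchangeability rho_ij multiplied by that branch's composition pi_j. The function names, the exchangeability values and the two branch compositions are all hypothetical, chosen only for illustration.

```python
# Illustrative sketch only; not the authors' implementation.
# Assumption: GTR-style rates, q_ij = rho_ij * pi_j for i != j.

def rate_matrix(rho, pi):
    """Build a rate matrix from symmetric exchangeabilities rho and composition pi."""
    n = len(pi)
    q = [[rho[i][j] * pi[j] if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for i in range(n):
        q[i][i] = -sum(q[i])  # each row of a rate matrix sums to zero
    return q


def stationary_residual(q, pi):
    """Largest absolute entry of the vector pi * q; near zero when pi is stationary."""
    n = len(pi)
    return max(abs(sum(pi[i] * q[i][j] for i in range(n))) for j in range(n))


# One global, symmetric exchangeability matrix (state order A, C, G, T).
# The values are made up for this example.
RHO = [
    [0.0, 1.0, 4.0, 1.0],
    [1.0, 0.0, 1.0, 4.0],
    [4.0, 1.0, 0.0, 1.0],
    [1.0, 4.0, 1.0, 0.0],
]

# Hypothetical composition vectors, one per branch (each sums to 1).
COMPOSITIONS = {
    "branch_1": [0.10, 0.40, 0.40, 0.10],  # GC-rich
    "branch_2": [0.40, 0.10, 0.10, 0.40],  # AT-rich
}

# Same exchangeabilities everywhere, but a different rate matrix on each branch.
RATE_MATRICES = {name: rate_matrix(RHO, pi) for name, pi in COMPOSITIONS.items()}

for name, q in RATE_MATRICES.items():
    print(name, stationary_residual(q, COMPOSITIONS[name]))
```

Under this assumed parameterisation, each branch's rate matrix has that branch's composition vector as its stationary distribution, so sequences evolving along different branches are pulled towards different compositions even though the exchangeabilities are shared.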


Similar resources

Analysis of mitochondrial DNA sequences of Turcinoemacheilus genus (Nemacheilidae Cypriniformes) in Iran

Members of the genus Turcinoemacheilus (family Nemacheilidae) were subjected to molecular phylogenetic analysis in this study. In 2009 this genus was reported to inhabit the Karoon River drainage, contrary to the previous assumption that it was endemic to the Tigris River basin. It was sampled from three stations located in different tributaries of the Karoon drainage and evaluated to unders...


SeqVis: Visualization of compositional heterogeneity in large alignments of nucleotides

Most phylogenetic methods assume that the sequences evolved under homogeneous, stationary and reversible conditions. Compositional heterogeneity in data intended for studies of phylogeny suggests that the data did not evolve under these conditions. SeqVis, a Java application for analysis of nucleotide content, reads sequence alignments in several formats and plots the nucleotide cont...


Molecular phylogenetics of the allodapine bee genus Braunsapis: A-T bias and heterogeneous substitution parameters.

Extreme AT bias in Hymenopteran mitochondrial genes has created difficulties for molecular phylogenetic analyses, especially for older divergences where multiple substitutions can erode signal. Heterogeneity in the evolutionary rates of different codon positions and different genes also appears to have been a major problem in resolving ancient divergences in allodapine bees. Here we examine th...


Spatial modelling of zonality elements based on compositional nature of geochemical data using geostatistical approach: a case study of Baghqloom area, Iran

Because of the constant-sum constraint, geochemical data are compositional data that form a closed number system. A closed number system is a dataset of several variables whose sum is constant, equal to one. By calculating the correlation coefficient of a closed number system and comparing it with an open number system, ...


Temporal Predictions with Bayesian Compositional Hierarchies

In this note I describe a novel approach to modelling and exploiting probabilistic dependencies in compositional hierarchies for model-based scene interpretation. I present Bayesian Compositional Hierarchies (BCHs) which capture all probabilistic information about the objects of a compositional hierarchy in object-centered aggregate representations. BCHs extend typical Bayesian Network models b...



Journal:
  • Statistical Applications in Genetics and Molecular Biology

Volume 13, Issue 5

Pages: -

Publication date: 2014